Neural Machine Translation by Jointly Learning to Align and Translate
The word "attention" appears only in Section 3.1 (italicized here by the quoter to make it stand out)
The probability $\alpha_{ij}$, or its associated energy $e_{ij}$, reflects the importance of the annotation $h_j$ with respect to the previous hidden state $s_{i-1}$ in deciding the next state $s_i$ and generating $y_i$.
Intuitively, this implements a mechanism of *attention* in the decoder.
The decoder decides parts of the source sentence to pay attention to.
By letting the decoder have an attention mechanism, we relieve the encoder from the burden of having to encode all information in the source sentence into a fixed-length vector.
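For reference, the paper defines the weights as $\alpha_{ij} = \exp(e_{ij}) / \sum_{k} \exp(e_{ik})$ with $e_{ij} = a(s_{i-1}, h_j)$, where the alignment model $a$ is a small feedforward network, and the context vector passed to the decoder is $c_i = \sum_j \alpha_{ij} h_j$. The following is a minimal NumPy sketch of this additive-attention step; the parameter names $W_a$, $U_a$, $v_a$ follow the paper's appendix, but the function name and the dimensions used here are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the energies
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """One step of additive (Bahdanau-style) attention -- a sketch.

    s_prev : (n,)   previous decoder hidden state s_{i-1}
    H      : (T, m) encoder annotations h_1 .. h_T
    W_a    : (d, n), U_a : (d, m), v_a : (d,)  alignment-model parameters
    Returns the attention weights alpha_i and the context vector c_i.
    """
    # energies e_{ij} = v_a^T tanh(W_a s_{i-1} + U_a h_j), one per annotation
    e = np.tanh(H @ U_a.T + s_prev @ W_a.T) @ v_a   # shape (T,)
    alpha = softmax(e)                              # weights alpha_{ij}, sum to 1
    c = alpha @ H                                   # context c_i = sum_j alpha_{ij} h_j
    return alpha, c

# Toy usage with made-up dimensions.
rng = np.random.default_rng(0)
n, m, T, d = 4, 6, 5, 8       # decoder dim, annotation dim, source length, attention dim
s_prev = rng.standard_normal(n)
H = rng.standard_normal((T, m))
W_a = rng.standard_normal((d, n))
U_a = rng.standard_normal((d, m))
v_a = rng.standard_normal(d)

alpha, c = additive_attention(s_prev, H, W_a, U_a, v_a)
print(alpha, alpha.sum())     # the weights form a distribution over source positions
```

Because the weights are recomputed from $s_{i-1}$ at every output step, the decoder can attend to different source positions for each target word, which is exactly what frees the encoder from compressing the whole sentence into one fixed-length vector.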